Ng Kian Wei

Data Science | Medical Imaging | Mixed Reality


Practical Signal Processing (Part 1): Frequencies and Filters


A typical data science project involves wrestling with real-world data to tease out meaningful information. The end result can take the form of beautiful charts provided in dashboards to end users, or probabilistic models that sieve through the noise and make predictions based on underlying trends.

Often, the trend that interests you will be hidden amongst noise. Noise here can refer to actual "junk" interference (e.g. line noise in cables, grainy camera footage), or it could actually be a pretty interesting phenomenon that just happens to be a hindrance when mixed into your current task ("one man's trash is another man's treasure")!

One pretty useful way to look at new data/problems is from the frequency perspective. This is especially useful for cases where the data has an inherent "ordered" characteristic (time, space, etc.).

This post will not cover the deep math and theories that come with a typical undergraduate EECS course, but will instead go through the intuitions and basic practical uses of such techniques.

We start off with a simple time-series weather dataset, grabbed from here. Off the bat, we can see that there's quite a bit of variation day-to-day, along with a slower-moving trend across the months/years. Which is the signal and which is the noise?

Well, that depends on what you're interested in extracting! For example, if your task is to figure out which is the best month to ... boil an egg outdoors(?), you'll probably be interested in the slower month-to-month trend, instead of the daily fluctuations. Conversely, if you're interested in seeing whether commuting patterns are influenced by the weather, you'll want to keep an eye on the daily fluctuations.

Either way, you'll be interested in separating the high frequency (daily) components of the signal from the low frequency (seasonal) ones, after which you can choose to discard one or the other. Filter design can get quite math-y, so let's start with a simple moving average (SMA) filter instead!

In essence, SMA filters are defined by just one hyperparameter -- window length. Here, we show an illustration of a 5-term SMA filter being applied on the top time series, resulting in the bottom filtered output. At each step, the filter coefficients (in this case all 1/N) are multiplied with the corresponding input signal values (top row), resulting in the intermediate decimal values shown. These are then summed up, resulting in the filtered output values (bottom row). This multiplication and summation chain forms the basis of convolution.

Here, we normalize the terms to 1/N such that the resulting output is basically an average of the 5 input values -- hence the name Simple Moving Average. Because the output takes into consideration several neighboring input values, it "filters away" some of the high-frequency components (notice how the extreme values 49 and 10 got pulled to the 20-30s range?) while preserving the overall trend. This makes it the simplest low-pass filter in existence!
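The multiply-and-sum chain described above maps directly onto NumPy's `convolve`. Here's a minimal sketch -- the input values are made up for illustration (loosely echoing the extremes in the figure), not taken from the actual weather dataset:

```python
import numpy as np

# Hypothetical daily readings, with a couple of extreme values (49 and 10)
signal = np.array([23.0, 49.0, 25.0, 10.0, 28.0, 26.0, 24.0, 22.0, 27.0, 25.0])

# A 5-term SMA filter: every coefficient is 1/N, so the output is a plain average
N = 5
kernel = np.ones(N) / N

# mode="valid" keeps only outputs where the kernel fully overlaps the input,
# which is why the output is shorter than the input (truncated head and tail)
smoothed = np.convolve(signal, kernel, mode="valid")
print(smoothed)  # first value: (23 + 49 + 25 + 10 + 28) / 5 = 27.0
```

Note how the extremes get pulled toward the 20-30s range, just as in the illustration.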

(Note 1: in the illustration, the output sequence is truncated at the head and tail. This is because our 5-term filter would ideally be matched with 5 valid terms from the input sequence for a proper "averaging" operation. In practice, this can be partially addressed by padding the input signal before the convolution.)
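To make the padding point concrete, here's a sketch of two common options (again on made-up values): NumPy's implicit zero-padding via `mode="same"`, versus explicitly repeating the edge values with `np.pad` before a "valid" convolution:

```python
import numpy as np

signal = np.array([23.0, 49.0, 25.0, 10.0, 28.0, 26.0])
kernel = np.ones(5) / 5

# Option 1: mode="same" zero-pads implicitly so output length matches input,
# but the edge outputs get dragged toward zero by the imaginary padding
same = np.convolve(signal, kernel, mode="same")

# Option 2: pad explicitly by repeating edge values, then convolve "valid" --
# edge outputs stay in a sensible range at the cost of some flattening
padded = np.pad(signal, pad_width=2, mode="edge")
filtered = np.convolve(padded, kernel, mode="valid")
```

Neither option recovers the "true" edge values, of course -- padding only fills the window with a guess.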


What are some basic considerations/adaptations for these types of windowed filters?

  • Filter length -- Playing with the slider above, it's easy to see that the filter's effect can be drastically altered by increasing the filter length (in essence the filter's influence). The trade-off should be very simple to grasp, but just be mindful of bad data propagating (from the padded extremities or NaN values -- see the Feb 2023 area!)
  • Filter coefficients -- Instead of having a constant 1/N for all terms, careful selection of values could lead to different use cases. Some examples include Gaussian filters and Exponential Moving Averages (examples in finance, deep learning).
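As a rough sketch of those two coefficient choices (helper names here are my own, not from any particular library): a Gaussian kernel replaces the flat 1/N weights with a bell curve that favors the window's center, while an EMA weights recent samples more heavily via a simple recursion:

```python
import numpy as np

def gaussian_kernel(length, sigma):
    """Bell-curve filter coefficients, normalized to sum to 1."""
    x = np.arange(length) - (length - 1) / 2  # centered positions
    w = np.exp(-0.5 * (x / sigma) ** 2)
    return w / w.sum()

def ema(signal, alpha):
    """Exponential moving average: each output blends the newest sample
    (weight alpha) with the previous output (weight 1 - alpha)."""
    out = np.empty(len(signal))
    out[0] = signal[0]
    for i in range(1, len(signal)):
        out[i] = alpha * signal[i] + (1 - alpha) * out[i - 1]
    return out
```

A Gaussian kernel can drop straight into `np.convolve` in place of the flat SMA kernel; the EMA, being recursive, has an effectively infinite window despite needing only one stored value.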


In fact, while the graphed example showcases filters that preserve only low-frequency components, there are coefficients that extract/respond to high-frequency components too (see 4th plot above). In an effort to keep things bite-sized, that's reserved for another post (to be coupled with some image processing stuff), but in the meantime, the SMA filter is already enough to extract basic high-frequency data -- just subtract the filter output from the original signal!
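That subtraction trick is a one-liner once the lengths match -- here's a sketch (edge-padding so the low-pass output lines up with the input; values are made up):

```python
import numpy as np

signal = np.array([23.0, 49.0, 25.0, 10.0, 28.0, 26.0, 24.0, 22.0])
kernel = np.ones(5) / 5

# Low-pass: the familiar SMA, edge-padded so output length matches the input
padded = np.pad(signal, pad_width=2, mode="edge")
low = np.convolve(padded, kernel, mode="valid")

# High-pass: whatever the low-pass removed is the high-frequency residual
high = signal - low
```

By construction, `low + high` reconstructs the original signal exactly, so nothing is lost in the split -- you're just choosing which half to keep.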


All material original unless otherwise stated. Plots and figures generally made in Python -- numpy/opencv/plotly/matplotlib